UKRAINIAN DACTYL ALPHABET GESTURE RECOGNITION USING
CONVOLUTIONAL NEURAL NETWORKS 
WITH 3D CONVOLUTIONS 


A technology developed with cross-platform tools is proposed for modeling the gestures of the Ukrainian dactyl alphabet, animating transitions between states of gesture units, and combining gestures (words). The technology reproduces a sequence of gestures using a virtual spatial hand model and recognizes dactylemes from the camera input stream using a convolutional neural network trained on a collected image dataset, based on the MobileNetv3 architecture, with an optimally selected configuration of layers and network parameters. An accuracy of over 98% was achieved on the collected test dataset.

Keywords: cross-platform, sign language, dactyl modeling, dactyl recognition, convolutional neural networks, mobilenet


A technology implemented with cross-platform tools is proposed for modeling gesture units of sign language, animating transitions between states of gesture units, and combining gestures into words. The implemented technology simulates a sequence of gestures using a virtual spatial hand model and recognizes dactyl signs from camera input using a convolutional neural network based on the MobileNetv3 architecture, trained on a collected dataset, with an optimal configuration of layers and network parameters. An accuracy of over 98% is achieved on the collected test dataset.

Keywords: cross-platform, sign language, dactyl modeling, dactyl recognition, convolutional neural networks, mobilenet


Introduction 

Sign language is one of the main means of communication, along with text and speech. Signs can be used to denote individual letters, words, or states, and can be processed, encoded, and stored in different ways. Building a technology for storing, modeling, and demonstrating signs and sign-language communication is challenging because of the differences between the available platforms. Different platforms run different operating systems (for example, mobile: iOS, Android; desktop: macOS, Linux, Windows; web: ChromeOS, and so forth), which implies different performance levels and requires porting the codebase to every platform; some platforms require an internet connection (for example, distributed computing technologies [1]) and others do not.




Developing such a technology for sign language is an important issue for people with hearing disabilities and their relatives, and it is also significant for wider use, due to the universality of sign language.

Cross-platform development [2] provides a way to overcome this issue. It can be used instead of virtual machines [3] or separate development for each platform. These tools make it possible to build a single codebase for various types of platforms, CPU types, operating systems, and hardware, and to deploy it consistently on all platforms.

In this article, a solution to the problem of sign language modeling based on cross-platform development is proposed. The sign language technology can be adapted and balanced depending on the hardware it runs on or on the availability of an internet connection. The proposed approach tunes the parameters of the 3D hand model (for example, the number of polygons used to render the hand and the step of sign transitions) based on the CPU type, the amount of available memory, and the internet connection speed. Sign recognition is also performed using cross-platform tools and can be tuned for the trade-off between model size and execution speed. Sign (gesture) modeling and recognition are parts of a single gesture communication technology, and this paper further develops the author's previous works [4], [5].
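The adaptive tuning described above can be illustrated with a small rule-based selector. This is a minimal Python sketch, not the author's Unity3D/C# code: the names `HandModelSettings` and `select_hand_model_settings`, the CPU, memory, and bandwidth thresholds, and the polygon and animation-step tiers are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandModelSettings:
    polygon_count: int     # polygons used to render the hand model
    transition_steps: int  # interpolation steps between two gesture states


# Hypothetical quality tiers: (polygon budget, animation steps), low -> high.
QUALITY_TIERS = ((10_000, 10), (30_000, 20), (70_000, 30))


def select_hand_model_settings(cpu_cores: int, memory_mb: int,
                               bandwidth_mbps: float) -> HandModelSettings:
    """Choose a quality tier from coarse device capabilities (illustrative thresholds)."""
    tier = 0
    if cpu_cores >= 4:        # assumption: 4+ cores can animate a denser mesh
        tier += 1
    if memory_mb >= 4096:     # assumption: 4 GB+ memory can hold the larger asset
        tier += 1
    if bandwidth_mbps < 2.0:  # assumption: slow links should download a lighter asset
        tier = min(tier, 1)
    polygons, steps = QUALITY_TIERS[tier]
    return HandModelSettings(polygon_count=polygons, transition_steps=steps)
```

For example, `select_hand_model_settings(8, 8192, 1.0)` picks the middle tier: the device could handle the densest mesh, but the slow connection caps it. Keeping the tiers in a lookup table lets the policy be tuned per platform without touching the rendering code.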

Existing approaches for recognition of sign language

Detection of hand gestures can be considered a type of object detection task, for which both classic computer vision and deep learning, in particular convolutional neural networks, offer a set of mature and novel approaches.

Since the release of AlexNet [6], the convolutional neural network architecture for the ImageNet contest, this approach has proved to give robust results under different conditions of input data. Neural networks show robust object detection quality when objects have diverse distortions, different scales, various lighting conditions, noise, and blur.

One of the key ideas in moving from static to dynamic object detection is to use multiple subsequent frames from the video input instead of a single image, in order to use temporal data along with spatial data.
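Using several subsequent frames means the recognizer works on a sliding window over the video stream rather than on single images. The sketch below, a hypothetical `ClipBuffer` class in Python, keeps the last N frames (N = 8 here, matching the clip length used later in this paper) and returns a clip once enough frames have arrived; frame capture and preprocessing are left out.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

Frame = TypeVar("Frame")


class ClipBuffer(Generic[Frame]):
    """Keeps the most recent `clip_length` frames of a video stream."""

    def __init__(self, clip_length: int = 8) -> None:
        if clip_length < 1:
            raise ValueError("clip_length must be positive")
        self.clip_length = clip_length
        # A deque with maxlen drops the oldest frame automatically.
        self._frames: Deque[Frame] = deque(maxlen=clip_length)

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Add a frame; return the current clip (oldest first) once it is full."""
        self._frames.append(frame)
        if len(self._frames) == self.clip_length:
            return list(self._frames)
        return None
```

Because the deque discards the oldest frame in constant time, a new clip is available after every incoming frame once the buffer is full, so recognition can run at the camera frame rate.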

As bigger datasets of recorded activities were released (Sports-1M [7], Kinetics [8], Jester [9]), convolutional neural networks with 3-dimensional convolutions became successful. The size of these datasets allowed such models to be trained without overfitting [10].
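The difference between 2D and 3D convolutions is that a 3D kernel also slides along the time axis, so one output value combines pixels from several consecutive frames. The following pure-Python sketch implements a single-channel 3D convolution (cross-correlation, as in deep learning frameworks) with stride 1 and no padding; it only shows the indexing, is not an efficient implementation, and the name `conv3d_valid` is ours.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [time][height][width]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D convolution (cross-correlation), stride 1, no padding."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t, out_h, out_w = T - kt + 1, H - kh + 1, W - kw + 1
    if min(out_t, out_h, out_w) < 1:
        raise ValueError("kernel is larger than the input volume")
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                # The kernel spans kt consecutive frames, so each output value
                # mixes spatial and temporal neighbours at once.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                out[t][y][x] = acc
    return out
```

With a 2×1×1 kernel [-1, 1] along time, the output is the difference between consecutive frames, a crude motion signal that no 2D kernel applied to a single frame can produce.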

Gestures of sign language have been detected using different approaches based on classic computer vision with hand-crafted features, such as orientation histograms [11], histograms of oriented gradients (HOG) [12], or bag-of-features [13]. However, as in other computer vision tasks, state-of-the-art hand gesture recognition architectures are based on CNNs [14, 15, 16].

Commonly, to achieve higher accuracy on gesture datasets, the CNN architecture was made more complex [17, 18].

However, the technology proposed in this paper aims to work cross-platform, on a diverse set of platforms and devices, some of which, such as hand-held devices (smartphones and tablets), have limited computational resources and capabilities. Thus, the review of existing CNN approaches focused on lightweight architectures that show satisfactory performance on mobile CPUs, such as SqueezeNet [19], MobileNet [20], MobileNetV2 [21], ShuffleNet [22], ShuffleNetV2 [23], and MobileNetV3 [24], which aim to reduce computational cost while keeping accuracy high. In our work, we have used the 2D and 3D versions of MobileNetV3.
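The MobileNet architectures listed above owe much of their efficiency to depthwise separable convolutions, which replace one k×k convolution over all channels with a per-channel k×k (depthwise) convolution followed by a 1×1 (pointwise) convolution. The arithmetic below counts multiply-accumulate operations (MACs) for both variants; the feature-map size and channel counts in the example are arbitrary illustrative values, and bias, batch-normalization, and activation costs are ignored.

```python
def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k convolution producing an h x w x c_out map ('same' padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


standard = standard_conv_macs(56, 56, 128, 128, 3)
separable = depthwise_separable_macs(56, 56, 128, 128, 3)
print(standard, separable, standard / separable)
```

For a 56×56 map with 128 input and output channels and k = 3, the standard convolution needs 462,422,016 MACs and the separable one 54,992,896, a ratio of 1152/137 ≈ 8.4, which matches the closed form 1 / (1/c_out + 1/k²).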

Problem statement 

The proposed technology should consist of two parts: a sign language [25] modeling module and a gesture recognition module. Both modules should be able to run on multiple platforms without codebase modification and should be developed using cross-platform tools.

The gesture recognition module should consist of a model that is able to detect and identify the gesture shown by the user in a camera input. The set of gestures is limited to the Ukrainian dactyl alphabet but can be extended further. An appropriate dataset of the Ukrainian dactyl alphabet should be collected for testing the model performance. The sign language modeling module should be able to reproduce a gesture specified by a set of parameters stored in a database; it is limited to the set of Ukrainian dactyl alphabet signs but can be extended further with other languages.

The gesture recognition module should use a model that shows robust, state-of-the-art performance along with high efficiency in terms of computational resources, in order to achieve high accuracy and a high frame rate (FPS) on various platforms using cross-platform technologies.

Proposed approach

To develop a technology for Ukrainian dactyl language modeling and recognition that can run on multiple platforms without changing the codebase, an approach based on cross-platform tools is proposed. The gesture modeling module should consist of a virtual three-dimensional hand model and a user interface that lets the user specify a symbol or a set of symbols, which are then reproduced as a sequence of gestures. To implement both the hand model and the user interface, the cross-platform framework Unity3D [26] was used. Compared to other 3D engines, it provides a unified development process for all available platforms (mobile, desktop, and web) and a seamless way to deploy the application on all of them without changing the codebase. To develop the gesture recognition module, the cross-platform framework TensorFlow [27] is proposed. This approach, based on a cross-platform machine learning framework, allows a gesture recognition model to be developed and trained once and then deployed on multiple platforms (mobile, desktop, and web) without any modifications to the model or the training code. As the model architecture, MobileNet enhanced with 3D convolutions is considered, to take into account temporal information from a sequence of input frames from the camera. Altogether, the novelty of the proposed technology is that it is a unified cross-platform technology for Ukrainian dactyl language modeling and recognition, with an improved MobileNet architecture for better recognition of the Ukrainian dactyl alphabet.

Gesture recognition 

Gesture recognition, as part of the cross-platform technology for Ukrainian dactyl language modeling and recognition, should be implemented using cross-platform tools. Gesture recognition approaches depend on the type of input information they work with. 3D model-based or skeletal-based algorithms can use a volumetric model, a skeletal model, or a combination of them. However, these approaches tend to be computationally expensive and require additional hardware from the user. The other type of approaches, appearance-based models, derive parameters directly from an image or a sequence of images (when video is used as input). As a next step, a pattern mining technique or a machine learning approach is used to train a recognition model. Since they need no additional hardware apart from a simple webcam, approaches of this type were selected for the cross-platform technology. For example, Ong et al. [28] propose Sequential Pattern Mining to detect signs based on tree structures.

Convolutional Neural Networks (CNNs) are a class of deep neural networks that are regularized versions of multilayer perceptrons, most commonly applied to analyzing images and videos. CNNs are especially good at analyzing images because they take into account the locality of reference in the data: in general input data, nearby samples are often unrelated, whereas in an image neighboring pixels are strongly related. Therefore, CNNs show state-of-the-art results in image classification and recognition tasks [29], [30]. Another benefit of convolutional neural networks is that, unlike conventional pattern matching algorithms, they need no hand-crafted features. The training process takes the input data, finds all the features needed for recognition, and stores them as the weights of the model. CNNs are robust at classifying or recognizing an object in an image regardless of input image scale, lighting conditions, occlusions, noise, etc. However, training such a model requires a sufficiently large dataset. Typically, a CNN architecture consists of a set of convolutional, pooling, and ReLU layers. The TensorFlow framework provides a cross-platform and performance-efficient implementation of convolutional neural networks.
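To make the conv → ReLU → pooling pattern concrete, here is a minimal single-channel version in pure Python. It is an illustrative sketch only (frameworks such as TensorFlow operate on multi-channel tensors with learned kernels); the helper names and the example edge-detecting kernel are ours.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Single-channel 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(kh) for dx in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Element-wise max(0, v): negative responses are discarded."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Image, size: int = 2) -> Image:
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[y * size + dy][x * size + dx]
                for dy in range(size) for dx in range(size))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 5x5 image that turns bright from column 2 onward.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
features = max_pool(relu(conv2d_valid(image, [[-1.0, 1.0]])))
# features == [[1.0, 0.0], [1.0, 0.0]]
```

The kernel [[-1, 1]] responds only at the dark-to-bright boundary; ReLU removes negative responses, and 2×2 max pooling keeps the detected edge while halving the resolution.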

Gesture recognition with MobileNet

The MobileNetV3 architecture (Figure 1) is a recent mobile architecture that develops the MobileNet model further. From its predecessor, MobileNetV2, it inherits two main ideas. First, residual blocks connect the beginning and the end of a convolutional block with a skip connection. By adding these two states, the network can access earlier activations that were not modified in the convolutional block. This approach turned out to be essential for building very deep networks. Second, the blocks follow a narrow->wide->narrow (inverted residual) approach. The first step widens the network using a 1x1 convolution, which is affordable because the following 3x3 (or 5x5) depthwise convolution already greatly reduces the number of parameters. Afterwards, another 1x1 convolution squeezes the network back to a small number of channels (matching the input channels when the skip connection is used). In MobileNetV2 the expansion factor is typically 6. In Figure 1, "exp size" is the number of channels after the expansion layer, "#out" is the number of output channels, SE marks blocks with squeeze-and-excitation, NL is the non-linearity (RE for ReLU, HS for hard-swish), NBN means no batch normalization, and s is the stride used for downsampling. This is a common assembly of convolutional blocks.
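The weight budget of one narrow->wide->narrow block can be checked with simple arithmetic. The sketch below counts the weights of the expansion, depthwise, and projection convolutions of a bottleneck, ignoring biases, batch normalization, and the squeeze-and-excitation module; it also assumes the expansion convolution is always present (implementations typically skip it when the expansion size equals the input channels). The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BottleneckCost:
    expand_params: int
    depthwise_params: int
    project_params: int
    has_residual: bool

    @property
    def total_params(self) -> int:
        return self.expand_params + self.depthwise_params + self.project_params


def inverted_residual_cost(c_in: int, exp: int, c_out: int,
                           kernel: int, stride: int) -> BottleneckCost:
    """Weight counts of a narrow -> wide -> narrow block (no biases, BN or SE)."""
    return BottleneckCost(
        expand_params=c_in * exp,                # 1x1 conv widens c_in -> exp
        depthwise_params=exp * kernel * kernel,  # one k x k filter per channel
        project_params=exp * c_out,              # 1x1 conv squeezes exp -> c_out
        # The skip connection is only possible when input and output shapes match.
        has_residual=(stride == 1 and c_in == c_out),
    )
```

For the Figure 1 row with input 14² × 40, a 5x5 kernel, expansion size 240, and 40 output channels at stride 1, this gives 9,600 + 6,000 + 9,600 = 25,200 weights plus a residual connection. For comparison, a single dense 5x5 convolution from 40 to 40 channels would already need 40 × 40 × 25 = 40,000 weights, without any widening.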


Input       | Operator        | exp size | #out | SE | NL | s
224² × 3    | conv2d, 3x3     | -        | 16   | -  | HS | 2
112² × 16   | bneck, 3x3      | 16       | 16   | ✓  | RE | 2
56² × 16    | bneck, 3x3      | 72       | 24   | -  | RE | 2
28² × 24    | bneck, 3x3      | 88       | 24   | -  | RE | 1
28² × 24    | bneck, 5x5      | 96       | 40   | ✓  | HS | 2
14² × 40    | bneck, 5x5      | 240      | 40   | ✓  | HS | 1
14² × 40    | bneck, 5x5      | 240      | 40   | ✓  | HS | 1
14² × 40    | bneck, 5x5      | 120      | 48   | ✓  | HS | 1
14² × 48    | bneck, 5x5      | 144      | 48   | ✓  | HS | 1
14² × 48    | bneck, 5x5      | 288      | 96   | ✓  | HS | 2
7² × 96     | bneck, 5x5      | 576      | 96   | ✓  | HS | 1
7² × 96     | bneck, 5x5      | 576      | 96   | ✓  | HS | 1
7² × 96     | conv2d, 1x1     | -        | 576  | ✓  | HS | 1
7² × 576    | pool, 7x7       | -        | -    | -  | -  | 1
1² × 576    | conv2d 1x1, NBN | -        | 1024 | -  | HS | 1
1² × 1024   | conv2d 1x1, NBN | -        | k    | -  | -  | 1


Fig. 1. Architecture of MobileNetv3. 


Network improvements in MobileNetV3 were made in two ways: (1) layer removal and (2) the swish non-linearity, used in its hard-swish form.
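Swish is defined as x · σ(x). MobileNetV3 uses its piecewise-linear approximation, hard-swish (HS in Figure 1), which replaces the sigmoid with ReLU6(x + 3) / 6 so that it is cheap to compute on mobile hardware. A scalar Python sketch of both functions:

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)


def swish(x: float) -> float:
    """swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def relu6(x: float) -> float:
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x: float) -> float:
    """Piecewise-linear swish approximation: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0
```

Hard-swish equals 0 for x ≤ -3 and x for x ≥ 3; in between it stays close to swish (for example about 0.667 versus about 0.731 at x = 1), while needing only additions, multiplications, and clamping.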

In the last block, the 1x1 expansion layer taken from the inverted residual unit of MobileNetV2 is moved past the pooling layer. This means the 1x1 layer works on feature maps of size 1x1 instead of 7x7, which makes it efficient in terms of computation and latency.

The expansion layer is computationally expensive. However, once it is moved past the pooling layer, the compression performed by the projection layer of the previous block is no longer needed. Therefore, that projection layer and the filtering (depthwise) layer of the previous bottleneck block can be removed.
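The saving from moving the final 1x1 layer past the pooling layer is easy to quantify. Using the channel counts from Figure 1 (576 input and 1024 output channels), the sketch below counts multiply-accumulates of that 1x1 convolution on a 7x7 map versus a 1x1 map; it ignores the cost of the pooling itself.

```python
def pointwise_macs(spatial: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a 1x1 convolution on a spatial x spatial feature map."""
    return spatial * spatial * c_in * c_out


before_pooling = pointwise_macs(7, 576, 1024)  # 1x1 conv on the 7x7 map
after_pooling = pointwise_macs(1, 576, 1024)   # same conv after 7x7 pooling
print(before_pooling, after_pooling, before_pooling // after_pooling)
```

On the 7x7 map the layer needs 28,901,376 MACs, after pooling only 589,824: exactly 7 × 7 = 49 times fewer, because a 1x1 convolution costs the same at every spatial position.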

Fig. 2. Pipeline of 3D-convolutions with MobileNetv3.


Ukrainian dactyl alphabet dataset collection for recognition with MobileNet

Since training a convolutional neural network depends heavily on a large and diverse dataset, a dataset of Ukrainian dactyl alphabet letters with diverse characteristics was collected to reach a sufficiently high accuracy. Each gesture has 1500 sample images; the gestures were shown by 50 different people, with 70% male and 30% female hands. Different lighting conditions were used (20% of images in bad, 30% in mediocre, and 50% in good lighting conditions). About 10% of the images were distorted with noise and blur. Overall, about 50,000 original images were collected as a training dataset. After applying additional augmentation techniques (such as rotation, random cropping, mirroring, etc.), the final dataset grew to about 150,000 images. For testing, 10% of the dataset was held out, giving a final training set of 135,000 images and a final test set of 15,000 images.
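The 10% test split is not described in detail above. One common option, sketched below as an assumption, is a stratified split that holds out the same fraction of every gesture class with a fixed random seed. The function `stratified_split` is hypothetical, and which items count as samples (original images, augmented images, or recording sessions) is left to the caller.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")


def stratified_split(samples: Sequence[Sample], labels: Sequence[Hashable],
                     test_fraction: float = 0.1, seed: int = 0
                     ) -> Tuple[List[Sample], List[Sample]]:
    """Hold out `test_fraction` of every class; returns (train, test)."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    by_class: Dict[Hashable, List[Sample]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_class, key=repr):  # deterministic class order
        items = list(by_class[label])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

A design note: if augmented copies of one original image, or frames of one recording session, end up on both sides of the split, the measured test accuracy can be optimistic; splitting the original images (or whole sessions) first and augmenting only the training part avoids this.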

To train the MobileNet-based convolutional neural network to recognize Ukrainian dactyl alphabet gestures, an appropriate dataset had to be collected, since no Ukrainian sign language datasets are freely available. Dedicated software was developed for recording short video sequences of Ukrainian dactyl alphabet gestures shown by different people. Since the recording software is not a direct part of the proposed technology but rather a helper tool, it was developed only for the Windows family of operating systems, using the C# programming language and the .NET framework. The pipeline for recording a single entry is as follows:

- The person sits in front of the webcam connected to the recording software;
- The person puts their hand into the region of interest of the recording software;
- The person shows a specific gesture from the Ukrainian dactyl alphabet;
- The recording operator starts the recording;
- The person showing the gesture starts to move the hand smoothly along different axes;
- After a video of appropriate length has been recorded, the operator stops the recording;
- The process continues with the next gesture.

MobileNetv3 with 3D convolutions

Figure 2 shows the pipeline with the spatiotemporal modeling approach used for 2D CNN models. The features of each of the 8 frames are extracted using the same 2D CNN and concatenated, keeping their order intact. Afterwards, two fully connected (fc) layers are applied to obtain class-conditional probability scores. The reasoning is that fc layers can organically infer the temporal relations without knowing that the input is a sequence at all. The 2D CNN extracts 64 features for each frame. The first fc layer reduces the feature dimension from 64x8=512 to 256, and the second fc layer reduces it to the number of classes.
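The dimension bookkeeping of this 2D pipeline (8 frames × 64 features = 512, then 256, then the number of classes) can be shown with a plain-Python forward pass. This is an illustrative sketch: the weights are random placeholders instead of trained parameters, and the ReLU between the two fc layers and the softmax at the output are our assumptions, since the activation functions are not specified above.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def concatenate_frames(frame_features: List[Vector]) -> Vector:
    """Join per-frame feature vectors, keeping the frame order intact."""
    return [value for features in frame_features for value in features]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_head(frame_features: List[Vector], num_classes: int,
                     hidden: int = 256, seed: int = 0) -> Vector:
    """Two fc layers on the concatenated features; random placeholder weights."""
    x = concatenate_frames(frame_features)  # e.g. 8 frames x 64 -> 512 values
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.05) for _ in x] for _ in range(hidden)]
    h = [max(0.0, v) for v in dense(x, w1, [0.0] * hidden)]  # fc1 + ReLU: 512 -> 256
    w2 = [[rng.gauss(0.0, 0.05) for _ in h] for _ in range(num_classes)]
    return softmax(dense(h, w2, [0.0] * num_classes))  # fc2: 256 -> num_classes
```

Since the concatenation preserves frame order, the first fc layer sees each frame's features at fixed positions, which is what lets it learn temporal relations without any explicit sequence model.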


Layer / Stride            | Repeat | Output size
Input clip                |        | c×8×112×112
Conv1 (3×3×3) / s(1,2,2)  | 1      | 32×8×56×56
Block / s(1,1,1)          | 1      | 16×8×56×56
Block / s(1,2,2)          | 2      | 24×8×28×28
Block / s(2,2,2)          | 3      | 32×4×14×14
Block / s(2,2,2)          | 4      | 64×2×7×7
Block / s(1,1,1)          | 3      | 96×2×7×7
Block / s(2,2,2)          | 3      | 160×1×1×1
Block / s(1,1,1)          | 1      | 320×1×1×1
Conv (1×1×1) / s(1,1,1)   | 1      | 1280×1×1×1
Linear (1280 × NumCls)    | 1      | NumCls


Fig. 3. Architecture of MobileNetv3 
with 3D-convolutions. 

On the other hand, 3D CNNs model spatiotemporal information intrinsically and do not require an extra mechanism. We have inflated SqueezeNet and MobileNetV3 so that they accept 8 frames as input. The details of the 3D-MobileNetV3 are given in Figure 3.
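The output sizes in Figure 3 follow from the strides. The sketch below traces the (time, height, width) extent through the stages, assuming "same" padding (each axis shrinks to ceil(n / s)) and that only the first block of a repeated stage is strided; channel counts are omitted, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Extent = Tuple[int, int, int]  # (time, height, width)


def strided_extent(extent: Extent, stride: Extent) -> Extent:
    """Output extent of a strided layer with 'same' padding: ceil(n / s) per axis."""
    t, h, w = extent
    st, sh, sw = stride
    return (math.ceil(t / st), math.ceil(h / sh), math.ceil(w / sw))


def trace_extents(input_extent: Extent, strides: List[Extent]) -> List[Extent]:
    """Extent after the input and after each listed layer or stage, in order."""
    extents = [input_extent]
    for stride in strides:
        extents.append(strided_extent(extents[-1], stride))
    return extents


# Conv1, the seven Block stages, and the final 1x1x1 conv from Figure 3.
FIG3_STRIDES: List[Extent] = [
    (1, 2, 2),                                    # Conv1
    (1, 1, 1), (1, 2, 2), (2, 2, 2), (2, 2, 2),   # Block stages 1-4
    (1, 1, 1), (2, 2, 2), (1, 1, 1),              # Block stages 5-7
    (1, 1, 1),                                    # Conv 1x1x1
]

for extent in trace_extents((8, 112, 112), FIG3_STRIDES):
    print(extent)
```

Under these assumptions the trace reproduces the 8×56×56, 8×28×28, 4×14×14, and 2×7×7 entries of Figure 3, while the last stride-(2,2,2) stage yields 1×4×4; the 1×1×1 entries therefore imply a global pooling step before the linear layer (or a different padding scheme).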

Gesture recognition experiment 

Standard techniques against overfitting of the neural network were applied in each training run.

During the training of the MobileNet-based convolutional neural network, multiple architecture modifications were tried in order to find the best trade-off between the number of layers and accuracy. At some point the accuracy of the trained model stopped increasing, so the obtained architecture was taken as optimal: the smallest architecture with the best accuracy. Its results are shown in Figure 4 (macro-average F1-score and confusion matrix).
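Figure 4 reports the macro-average F1-score together with the confusion matrix. The sketch below computes it from a confusion matrix, assuming rows are true classes and columns are predicted classes (the orientation used in Figure 4 is not stated). Per-class F1 is 2·TP / (2·TP + FP + FN), and the macro average weights every gesture class equally regardless of its size; the function names are ours.

```python
from typing import List, Sequence


def per_class_f1(confusion: Sequence[Sequence[int]]) -> List[float]:
    """F1 for each class; rows are true labels, columns are predicted labels."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually other
        fn = sum(confusion[k]) - tp                       # actually k, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)
```

Unlike plain accuracy, the macro average exposes a single poorly recognized letter even when all other classes are near perfect.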


Ukrainian dactyl language recognition MobileNet confusion matrix 


Fig. 4. Confusion matrix of the optimal 
architecture model. 


Conclusions 

The proposed technology consists of two main modules, gesture modeling and gesture recognition, which use a database of gesture specifications stored in YAML format in a PostgreSQL [31] database.

The proposed technology implements gesture modeling and gesture recognition for Ukrainian dactyl alphabet gestures with cross-platform development tools. Gesture modeling was implemented using the Unity3D framework, which is cross-platform and shows satisfactory performance on different platforms (mobile, web, and desktop) while rendering a realistic three-dimensional hand model. The number of polygons and the animation step of gesture transitions can be adjusted for the sake of performance.

A dataset of more than 50,000 images was collected under diverse conditions with the hands of different people. The dataset was augmented using specific techniques, and the final dataset consists of 150,000 images. The gesture recognition module was implemented using the TensorFlow framework, which makes it possible to deploy the model on different platforms without any codebase modifications. MobileNet was chosen as the gesture recognition architecture, as the model with the best trade-off between size and accuracy, especially on low-performance platforms (such as mobile and web). The model was trained on the collected Ukrainian dactyl language dataset. Thanks to the augmentations, the model showed a state-of-the-art level of performance. Based on experiments, the optimal model architecture was chosen to keep the best performance level with the smallest possible model size, and the corresponding experimental results were presented. The performance of the CNN model was compared to other approaches and showed similar or superior values.

The proposed gesture communication 
technology can be further augmented with 
other gestures and languages and with other 
cross-platform modules. 

SUMMARY


S. S. Kondratiuk

Ukrainian dactyl alphabet gesture recognition using convolutional neural networks with three-dimensional convolution

Sign language is one of the main means of transmitting information, along with text and speech. As a rule, each country has its own native sign language, so it is not known exactly how many sign languages exist in the world. Ukrainian sign language and the Ukrainian dactyl alphabet are among the most widespread means of communication in Ukraine after written and spoken communication.

Providing this community with a technology for learning the gestures (signs, dactylemes) of Ukrainian sign language is a relevant problem and a challenging task.

To solve the problem of modeling sign language and animating gesture structures with a spatial virtual hand model, a cross-platform technology based on the cross-platform Unity3D library is proposed. The Unity3D library is also used for the user interface, and the technology is implemented in the C# programming language. The proposed tools can solve the problem of running the technology on several existing platforms. The novelty of the proposed technology is that it is cross-platform and has an adjustable polygon level for the three-dimensional hand model and an adjustable animation step for gesture transitions.

The hand model embedded in the gesture modeling module has 27 bones, each connected to another through different types of joints. The core of the module is the technology for modeling the three-dimensional hand model and animating gestures between morphemes. It can efficiently reproduce a realistic hand model consisting of more than 70,000 polygons.

Gesture recognition modules developed with cross-platform tools (based on Python and C++) can be integrated into the information technology. Convolutional neural networks have shown the best results in image and gesture recognition tasks. For the experiment, a dataset of Ukrainian dactylemes was collected. Each gesture consists of 1000 sample images; the gestures were shown by 50 different people, with 70% male and 30% female hands. Different lighting conditions were used (20% of images in poor, 30% in medium, and 50% in good lighting conditions), and 10% of the images were distorted with noise and blur.

The MobileNetv3 architecture was used as the basis of the CNN architecture. To improve recognition quality, a three-dimensional convolution over several consecutive frames with gestures was used.

Its training lasted about 300,000 iterations, which is approximately 12 epochs, and an accuracy of about 98% was achieved on the test dataset.